Tree Annotation Tool using Two-phase Parsing to Reduce Manual Effort for Building a Treebank
Abstract
In this paper, we propose a tree annotation tool that uses a parser to build a treebank. To minimize manual effort without modifying the parser, the tool first segments a sentence and then performs two-phase parsing: one phase for the intra-structure of each segment and one for the inter-structure across segments. Experimental results show that it reduces manual effort by about 24.5% compared with a tree annotation tool without segmentation: although the annotator must segment some long sentences, the annotator's interventions for cancellation and reconstruction decrease markedly.
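The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stand-in parser `right_branching`, the function `two_phase_parse`, and the tuple-based tree representation are all hypothetical names chosen for illustration, and the sketch assumes the sentence has already been segmented by the annotator.

```python
from typing import Callable, List, Union

# A tree is either a leaf token (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def right_branching(items: List[Tree]) -> Tree:
    """Stand-in for an unmodified parser (assumption): builds a
    right-branching binary tree over whatever units it is given."""
    if len(items) == 1:
        return items[0]
    return (items[0], right_branching(items[1:]))


def two_phase_parse(
    segments: List[List[str]],
    parser: Callable[[List[Tree]], Tree] = right_branching,
) -> Tree:
    """Parse a pre-segmented sentence in two phases."""
    # Phase 1: intra-structure, one parse per segment.
    subtrees = [parser(list(segment)) for segment in segments]
    # Phase 2: inter-structure, treating each segment subtree as one unit.
    return parser(subtrees)


# Hypothetical segmentation of a short sentence into two segments.
segments = [["the", "tool"], ["reduces", "effort"]]
print(two_phase_parse(segments))
```

Because the same `parser` callable is reused in both phases, the sketch mirrors the abstract's constraint that the parser itself is not modified; only its input changes between phases.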
Similar resources
Interactive Predictive Parsing Framework for the Spanish Language
The Interactive Predictive Parsing (IPP) framework allows us the construction of interactive tree annotation systems. These can help human annotators in creating error-free parse trees with little effort (compared to manually post-editing the trees obtained from a completely automatic parser). In this paper we adapt the IPP framework and the IPP-Ann annotation tool for parse of the Spanish lang...
Transformed Subcategorization Frames in Chunk Parsing
This paper describes an approach to treebank development which relies on the manual development of annotation tools. The overall process of tree annotation is described, and a special emphasis is put on the description of the last tool which has been built, i.e. a dependency-based robust chunk parser. The modularization of the parser and the central role of verbal subcategorization is presented...
TamilTB: An Effort Towards Building a Dependency Treebank for Tamil
Annotated corpora such as treebanks are important for the development of parsers, language applications as well as understanding of the language itself. Only very few languages possess these scarce resources. In this paper, we describe our effort in syntactically annotating a small corpus (600 sentences) of the Tamil language. Our annotation is similar to Prague Dependency Treebank (PDT 2.0) and c...
A Machine Learning Approach to Automatic Functor Assignment in the Prague Dependency Treebank
The aim of this paper is to describe and evaluate a system that automates a part of the transition from analytical to tectogrammatical tree structures within the Prague Dependency Treebank. In particular, it assigns functors to autosemantic words. The system is based on the machine learning approach of decision tree induction. The resulting software tool is incorporated into the annotation proc...
Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language
The recent success of statistical parsing methods has made treebanks important resources for building good parsers. However, constructing high-quality annotated treebanks is a challenging task. We utilized two publicly available parsers, Berkeley and MST parsers, for feedback on improving the quality of part-of-speech tagging for the Vietnamese Treebank. Analysis of the treebank and parsi...